Abstract

Background: Recent global genomic analyses identified 69 gene sets and 12 core signaling pathways genetically 
altered in pancreatic cancer, which is a highly malignant disease. A comprehensive understanding of the genetic 
signatures and signaling pathways that are directly correlated to pancreatic cancer survival will help cancer 
researchers to develop effective multi-gene targeted, personalized therapies for the pancreatic cancer patients at 
different stages. A previous work that applied a LASSO penalized regression method, which only considered 
individual genetic effects, identified 12 genes associated with pancreatic cancer survival. 

Results: In this work, we integrate pathway information into pancreatic cancer survival analysis. We introduce and 
apply a doubly regularized Cox regression model to identify both genes and signaling pathways related to 
pancreatic cancer survival. 

Conclusions: Four signaling pathways (ion transport, immune phagocytosis, TGF-β (spermatogenesis), and
regulation of DNA-dependent transcription) and 15 genes within the four pathways are identified and
verified to be directly correlated to pancreatic cancer survival. Our findings can help cancer researchers design new
strategies for the early detection and diagnosis of pancreatic cancer.




Background 

Pancreatic cancer [1] is a devastating disease with a very poor prognosis and a
five-year survival rate of around 3-5%. The most common form of pancreatic cancer
is pancreatic ductal adenocarcinoma (PDAC), a malignant exocrine cancer. In the
past 30 years, no substantial progress has been made in PDAC diagnosis and
treatment. New techniques and methods to investigate the dynamics of PDAC are
urgently needed. Modern microarray technology has revolutionized the way we study
complex biological systems, allowing pancreatic cancer researchers to perform
genome-wide expression profiling and measure other patient features in a fast,
precise, and cost-effective way. One aim of systems biologists is to correctly
decipher and interpret high-dimensional, complex gene expression data, that is,
to identify the key genetic signatures and signaling pathways implicated in
disease.

Pancreatic cancer is characterized by rapid growth, early local and distant
invasion, interactions with stromal cells (e.g., pancreatic stellate cells) [2]
and fibrous tissue, and a high resistance to chemotherapy and radiotherapy. The
evolution of pancreatic cancer is partially stimulated by the overexpression of
several growth factors and cytokines and by genetic alterations [3,4] at
different stages of PDAC. Recent global genomic analyses identified 69 gene sets
and 12 core signaling pathways genetically altered in pancreatic cancer [1]. Most
previous genomic analyses and microarray studies focused on identifying
differentially expressed and metastasis-associated genes at different stages of
pancreatic cancer [3,5], ignoring an important clinical factor: survival time.
Stratford et al. identified six genetic signatures [6] associated with metastatic
pancreatic cancer using a sequence of statistical techniques, including the
significance analysis of microarrays (SAM) [7], a centroid-based predictor [8],
Pearson correlation, X-Tile [9], the Kaplan-Meier estimator [10], and the Cox
model [11]. Though these genes could help discriminate high- and low-risk
patients, the prediction was not based on survival time. A comprehensive
understanding of the genetic signatures and signaling pathways that are directly
correlated to pancreatic cancer survival will help cancer researchers develop
effective multi-gene targeted, personalized therapies for pancreatic cancer
patients at different stages and improve the survival rate.

The Cox proportional hazards model [11] is the most popular survival model used
to describe the relationship between a patient's survival time and predictor
variables. With high-dimensional data (e.g., in a microarray study), where the
number of predictors (genes) far exceeds the number of subjects (patients), the
Cox model cannot be fitted directly unless the high dimensionality is properly
handled. The regularization approach has been widely used to select important
variables from a large pool of candidate variables [12-14]. For example, a lasso
(least absolute shrinkage and selection operator) penalty can be imposed on
individual variables to automatically remove unimportant ones by shrinking their
regression coefficients to exactly zero [15]. In our previous work [16], we
applied a lasso penalized Cox regression method, for the first time, to
investigate the signature genes that are correlated with pancreatic cancer
survival time. We identified 12 genes associated with pancreatic cancer survival,
eight of which have been confirmed to be genetically altered and differentially
expressed in gastric, colorectal, ovarian, breast, skin, kidney, colon, lung, and
pancreatic cancers in in vivo and in vitro experiments [17-25]. It has been shown
that these survival-associated genes can also help to grade the stage and
estimate the survival time of PDAC patients.

However, genes may act as groups rather than individually, since some genes
belong to the same pathways and are involved in the same biological processes.
Pathway information is biologically important to our understanding of gene
regulatory networks and cancer development [1]. The previous work [16] performs
gene selection based solely on the strength of individual genes and ignores the
information of signaling pathways. Recently, several variable selection methods
have been introduced to consider group effects. For example, the group lasso
method penalizes the $\ell_2$-norm (Euclidean norm) of the coefficients within
each group in linear regression [26] and in the Cox proportional hazards model
[12]. Based on the boosting technique, a group additive regression model [27] and
a nonparametric pathway-based regression model [28] were developed to identify
groups of genomic features that are related to several clinical phenotypes,
including the survival outcome. However, those group selection methods only
conduct "group selection" without "within-group selection", since they select
variables in an "all-in-or-all-out" fashion. That is, if one variable in a group
is selected, all the other variables in the same group will also be selected.

Although pathways as a whole are involved in the development of pancreatic
cancer, according to the global genomic analyses, not all the genes in the same
pathway are involved in the process. In this work, we employ a doubly regularized
Cox (DrCox) regression model [29] that integrates both genes and signaling
pathways for pancreatic cancer survival analysis. Both the non-overlap and
overlap cases of DrCox are considered. Cyclic coordinate descent algorithms are
derived for parameter estimation. Using DrCox, we analyze the high-dimensional
microarray data of pancreatic cancer patients with localized and resected PDAC
collected between 1999 and 2007 [6]. Four signaling pathways (ion transport,
immune phagocytosis, TGF-β (spermatogenesis), and regulation of DNA-dependent
transcription) and 15 genes within these four pathways are identified and
verified to be directly correlated to pancreatic cancer survival. Compared with
other methods, the DrCox model can provide more accurate and useful prediction of
survival time [29]. These findings can help cancer researchers design new
strategies for the early detection and diagnosis of pancreatic cancer at
different stages.

Methods 

In this section, we describe the doubly regularized Cox 
(DrCox) regression and derive the parameter estimates via 
cyclic coordinate descent algorithms. We first present the 
case where the groups do not overlap, i.e., each variable 
belongs to only one group. Then we discuss the overlap 
case, i.e., variables are allowed to belong to multiple groups. 

Doubly regularized Cox (DrCox) regression for non-overlap cases

Assume that the $p$ variables (genes) occur in $K$ groups (pathways). We further
assume that the $k$th group has $p_k$ variables and denote the $p_k$ variables in
the $k$th group by $X_{(k)} = (X_{k1}, \ldots, X_{kp_k})^T$, with the
corresponding regression coefficients
$\beta_{(k)} = (\beta_{k1}, \ldots, \beta_{kp_k})^T$. For a sample of $n$
subjects, let $T_i$ and $C_i$ denote the survival time and the censoring time for
subject $i = 1, \ldots, n$. The observed survival time is defined by
$Y_i = \min\{T_i, C_i\}$ and the censoring indicator is
$\delta_i = I(T_i \le C_i)$. The $p$ predictor variables of the $i$th subject are
denoted by $X_i = (X_{i,(1)}^T, \ldots, X_{i,(K)}^T)^T$, where
$X_{i,(k)} = (X_{i,k1}, \ldots, X_{i,kp_k})^T$. The survival time $T_i$ and the
censoring time $C_i$ are conditionally independent given $X_i$. The censoring
mechanism is assumed to be noninformative. The observed data can be represented
by the triplets $\{(Y_i, \delta_i, X_i), i = 1, \ldots, n\}$.
The Cox proportional hazards model [11] composed of $p$ genes and $K$ pathways is
written as

$$h(t \mid X) = h_0(t) \exp\Big(\sum_{k=1}^{K} \sum_{j=1}^{p_k} \beta_{kj} X_{kj}\Big) = h_0(t) \exp\Big(\sum_{k=1}^{K} X_{(k)}^T \beta_{(k)}\Big),$$

where $\sum_{k=1}^{K} p_k = p$. The partial likelihood of the Cox model is

$$L_n(\beta) = \prod_{i \in D} \frac{\exp\big(\sum_{k=1}^{K} X_{i,(k)}^T \beta_{(k)}\big)}{\sum_{l \in R_i} \exp\big(\sum_{k=1}^{K} X_{l,(k)}^T \beta_{(k)}\big)},$$

where $D$ is the set of indices of observed failures, and $R_i$ is the set of
indices of the subjects who are at risk at time $Y_i$.
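As a rough illustration of the partial likelihood, the following Python sketch evaluates the negative log-partial likelihood on a small data set. The function name, the flat coefficient vector (groups concatenated), and the toy inputs are assumptions made for illustration only. Tied event times each use the full risk set (the Breslow convention), a choice the text above does not specify.

```python
import math


def neg_log_partial_likelihood(times, events, X, beta):
    """Negative Cox log-partial likelihood.

    times  : observed times Y_i
    events : 1 if the failure was observed (delta_i = 1), 0 if censored
    X      : list of covariate rows, one row per subject
    beta   : list of regression coefficients (all groups concatenated)
    """
    # Linear predictor x_i^T beta for every subject.
    eta = [sum(x * b for x, b in zip(row, beta)) for row in X]
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects enter only through the risk sets
        # Risk set R_i: subjects still under observation at time Y_i.
        denominator = sum(
            math.exp(eta[l]) for l, t_l in enumerate(times) if t_l >= t_i
        )
        total += eta[i] - math.log(denominator)
    return -total


# Hypothetical toy data: three subjects, one covariate, last subject censored.
toy_times = [1.0, 2.0, 3.0]
toy_events = [1, 1, 0]
toy_X = [[1.0], [0.0], [2.0]]
value_at_zero = neg_log_partial_likelihood(toy_times, toy_events, toy_X, [0.0])
```

With all coefficients at zero, every subject in a risk set has the same weight, so the value reduces to the sum of the logarithms of the risk-set sizes (here log 3 + log 2).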

To achieve both group and within-group variable selection and to overcome the
non-convexity drawback, the doubly regularized Cox regression model imposes a
mixture of the lasso penalty and the group lasso penalty on the log-partial
likelihood $\ell_n(\beta) = \log L_n(\beta)$:

$$g(\beta) = -\ell_n(\beta) + \lambda_1 \sum_{k=1}^{K} \sum_{j=1}^{p_k} |\beta_{kj}| + \lambda_2 \sum_{k=1}^{K} \|\beta_{(k)}\|_2 = -\ell_n(\beta) + \lambda_1 \sum_{k=1}^{K} \|\beta_{(k)}\|_1 + \lambda_2 \sum_{k=1}^{K} \|\beta_{(k)}\|_2, \qquad (1)$$

where $\|\beta_{(k)}\|_1 = \sum_{j=1}^{p_k} |\beta_{kj}|$ is the lasso penalty on
individual parameters, $\|\beta_{(k)}\|_2 = \sqrt{\sum_{j=1}^{p_k} \beta_{kj}^2}$
is the group penalty on groups of parameters, and $\lambda_1$ and $\lambda_2$ are
two nonnegative tuning constants controlling the strength of variable selection.
The larger the tuning constants, the fewer variables are retained in the model.
In this paper, the values of the tuning constants are determined using the
(data-driven) k-fold cross-validation technique to select a subset of relevant
genes and signaling pathways for accurate and robust prediction.
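To make the two penalty terms concrete, here is a minimal Python sketch that evaluates the combined lasso and group lasso penalty for non-overlapping groups. The function name and the nested-list representation of the groups are illustrative assumptions, and the coefficient values are hypothetical.

```python
import math


def drcox_penalty(beta_groups, lam1, lam2):
    """lam1 * sum_k ||beta_(k)||_1 + lam2 * sum_k ||beta_(k)||_2.

    beta_groups : list of coefficient lists, one list per (non-overlapping) group
    """
    l1_part = sum(abs(b) for group in beta_groups for b in group)
    l2_part = sum(math.sqrt(sum(b * b for b in group)) for group in beta_groups)
    return lam1 * l1_part + lam2 * l2_part


# Hypothetical coefficients: the second group is entirely zero and adds nothing.
penalty_value = drcox_penalty([[3.0, 4.0], [0.0, 0.0]], lam1=0.3, lam2=0.1)
```

For these values the lasso part is 0.3 × 7 and the group part is 0.1 × 5, so the penalty is approximately 2.6. A group whose coefficients are all zero contributes nothing to either term, which is what allows whole pathways to drop out of the model.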

Coordinate descent for non-overlap cases 

Since there are more predictor variables than subjects ($p > n$), we use a cyclic
coordinate descent algorithm to tackle the high-dimensionality problem; this
algorithm has been shown to be computationally efficient [30-33]. The idea is to
break a large optimization problem into a sequence of small ones. In other words,
instead of estimating all the parameters at the same time, we update the
parameters one by one. Readers can refer to [31,32] for more details.
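The one-parameter-at-a-time idea can be illustrated on a much simpler problem than the penalized Cox objective. The Python sketch below applies cyclic coordinate descent to a smooth quadratic, 0.5 b^T A b - c^T b, where each one-dimensional subproblem has a closed-form minimizer. The quadratic, the function name, and the fixed number of sweeps are assumptions chosen for illustration; this is not the DrCox algorithm itself.

```python
def cyclic_coordinate_descent(A, c, sweeps=100):
    """Minimize 0.5 * b^T A b - c^T b by cycling through the coordinates.

    A is assumed symmetric positive definite, so every diagonal entry is positive.
    """
    p = len(c)
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # With the other coordinates held fixed, the minimizer in b_j
            # solves A[j][j] * b_j + sum_{l != j} A[j][l] * b_l = c[j].
            others = sum(A[j][l] * b[l] for l in range(p) if l != j)
            b[j] = (c[j] - others) / A[j][j]
    return b


# Hypothetical 2 x 2 example; the exact minimizer solves A b = c.
solution = cyclic_coordinate_descent([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
```

In this toy case the iterates converge to the solution of A b = c, namely (0.8, 1.4). Each inner step only touches one coordinate, which is what keeps the per-update cost low when p is large.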

In the non-overlap case, where each variable belongs to only one group, the
parameters can be estimated and the important variables selected by minimizing
(1) iteratively with respect to one parameter at a time.



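The first step is to calculate the forward and backward directional derivatives
of the objective function $g(\beta)$ in (1), whose smooth part is the negative
log-partial likelihood $-\ell_n(\beta)$, for each parameter. If $e_{kj}$ is the
coordinate direction along which $\beta_{kj}$ varies, then the forward and
backward directional derivatives of $\beta_{kj}$ are

$$d_{e_{kj}} g(\beta) = \lim_{\tau \downarrow 0} \frac{g(\beta + \tau e_{kj}) - g(\beta)}{\tau} = -\frac{\partial \ell_n(\beta)}{\partial \beta_{kj}} + \begin{cases} (\lambda_1 + \lambda_2)(-1)^{I(\beta_{kj} < 0)} & \text{if } \|\beta_{(k)}\|_2 = 0, \\ \lambda_1 (-1)^{I(\beta_{kj} < 0)} + \lambda_2 \dfrac{\beta_{kj}}{\|\beta_{(k)}\|_2} & \text{if } \|\beta_{(k)}\|_2 > 0, \end{cases}$$

and

$$d_{-e_{kj}} g(\beta) = \lim_{\tau \downarrow 0} \frac{g(\beta - \tau e_{kj}) - g(\beta)}{\tau} = \frac{\partial \ell_n(\beta)}{\partial \beta_{kj}} + \begin{cases} (\lambda_1 + \lambda_2)(-1)^{I(\beta_{kj} > 0)} & \text{if } \|\beta_{(k)}\|_2 = 0, \\ \lambda_1 (-1)^{I(\beta_{kj} > 0)} - \lambda_2 \dfrac{\beta_{kj}}{\|\beta_{(k)}\|_2} & \text{if } \|\beta_{(k)}\|_2 > 0, \end{cases}$$

where $I(\cdot)$ is an indicator function equal to 1 if the condition in the
parentheses is satisfied and 0 otherwise, and

$$\frac{\partial \ell_n(\beta)}{\partial \beta_{kj}} = \sum_{i \in D} \left[ X_{i,kj} - \frac{\sum_{l \in R_i} X_{l,kj} \exp\big(\sum_{k'=1}^{K} X_{l,(k')}^T \beta_{(k')}\big)}{\sum_{l \in R_i} \exp\big(\sum_{k'=1}^{K} X_{l,(k')}^T \beta_{(k')}\big)} \right].$$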
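The case distinction in the directional derivatives can be sketched in a few lines of Python. In this sketch the partial derivative of the negative log-partial likelihood with respect to the coordinate is passed in as a number (`grad_loss_j`), so the code covers only the penalty logic. All function and argument names are hypothetical.

```python
import math


def penalty_directional_derivatives(beta_group, j, lam1, lam2):
    """Forward and backward directional derivatives of the penalty along e_kj."""
    b = beta_group[j]
    norm = math.sqrt(sum(v * v for v in beta_group))
    sign_fwd = -1.0 if b < 0 else 1.0  # (-1)^{I(beta_kj < 0)}
    sign_bwd = -1.0 if b > 0 else 1.0  # (-1)^{I(beta_kj > 0)}
    if norm == 0.0:
        # Whole group at zero: both penalties have a kink in this coordinate.
        return (lam1 + lam2) * sign_fwd, (lam1 + lam2) * sign_bwd
    # Group norm is smooth here; only the lasso term may have a kink.
    return lam1 * sign_fwd + lam2 * b / norm, lam1 * sign_bwd - lam2 * b / norm


def directional_derivatives(grad_loss_j, beta_group, j, lam1, lam2):
    """Add the smooth part; grad_loss_j is d(-l_n)/d(beta_kj) at the current point."""
    pen_fwd, pen_bwd = penalty_directional_derivatives(beta_group, j, lam1, lam2)
    return grad_loss_j + pen_fwd, -grad_loss_j + pen_bwd


# Hypothetical values: an all-zero group with a small gradient stays at zero,
# because both directional derivatives are nonnegative.
fwd, bwd = directional_derivatives(-0.2, [0.0, 0.0], 0, lam1=0.3, lam2=0.1)
```

In the example the gradient magnitude (0.2) is smaller than λ1 + λ2 = 0.4, so neither direction decreases the objective and the coefficient remains exactly zero.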
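After obtaining the directional derivatives, we then need to decide which
parameters to update and the direction for updating. If both of the directional
derivatives $d_{e_{kj}} g(\beta)$ and $d_{-e_{kj}} g(\beta)$ are nonnegative,
then the update for $\beta_{kj}$ is skipped. If either directional derivative is
negative, then we solve for the minimum along the corresponding direction. It is
impossible for both directional derivatives to be negative due to the convexity
of $g(\beta)$. After identifying the direction to update the parameter, one can
use Newton's method to solve for the minimum. For an update along the forward
direction, the update at iteration $m + 1$ is given by

$$\beta_{kj}^{m+1} = \beta_{kj}^{m} - \frac{-\dfrac{\partial \ell_n(\beta^m)}{\partial \beta_{kj}} + \lambda_1 (-1)^{I(\beta_{kj}^m < 0)} + \lambda_2 \Big[ (-1)^{I(\beta_{kj}^m < 0)} I_1(\beta_{(k)}^m) + \dfrac{\beta_{kj}^m}{\|\beta_{(k)}^m\|_2} I_2(\beta_{(k)}^m) \Big]}{-\dfrac{\partial^2 \ell_n(\beta^m)}{\partial \beta_{kj}^2} + \lambda_2 \dfrac{\|\beta_{(k)}^m\|_2^2 - (\beta_{kj}^m)^2}{\|\beta_{(k)}^m\|_2^3} I_2(\beta_{(k)}^m)},$$

where $\beta^m$ is the estimate at iteration $m$,
$I_1(\cdot) = I(\|\cdot\|_2 = 0)$, and $I_2(\cdot) = I(\|\cdot\|_2 > 0)$. The
update along the backward direction is analogous, with the numerator replaced by
$-d_{-e_{kj}} g(\beta^m)$.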
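The decision rule and the Newton step can be sketched for a single coordinate as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the first and second partial derivatives of the negative log-partial likelihood are supplied by the caller (`grad_j`, `hess_j`), all names are hypothetical, and the guard against stepping across zero is an extra safeguard added here.

```python
import math


def coordinate_update(beta_group, j, grad_j, hess_j, lam1, lam2):
    """One Newton update of coordinate j within a non-overlapping group.

    grad_j, hess_j : first and second partial derivatives of the smooth loss
                     (negative log-partial likelihood) w.r.t. beta_kj at the
                     current estimate; hess_j is assumed positive.
    Returns the new value of beta_kj.
    """
    b = beta_group[j]
    norm = math.sqrt(sum(v * v for v in beta_group))
    if norm == 0.0:
        pen_fwd = pen_bwd = lam1 + lam2
        pen_hess = 0.0
    else:
        ratio = b / norm
        pen_fwd = lam1 * (-1.0 if b < 0 else 1.0) + lam2 * ratio
        pen_bwd = lam1 * (-1.0 if b > 0 else 1.0) - lam2 * ratio
        pen_hess = lam2 * (norm * norm - b * b) / norm ** 3
    d_fwd = grad_j + pen_fwd
    d_bwd = -grad_j + pen_bwd
    if d_fwd >= 0.0 and d_bwd >= 0.0:
        return b  # neither direction decreases the objective: skip
    # One-sided slope of the objective w.r.t. beta_kj on the descending side.
    slope = d_fwd if d_fwd < 0.0 else -d_bwd
    new_b = b - slope / (hess_j + pen_hess)
    # Safeguard (assumption of this sketch): do not step across zero,
    # where the penalty is not differentiable.
    if b != 0.0 and new_b * b < 0.0:
        new_b = 0.0
    return new_b


# Hypothetical check on the quadratic loss 0.5 * h * (b - c)^2 with h = 2, c = 1,
# starting from b = 0, where the gradient is -h * c = -2.
updated = coordinate_update([0.0], 0, grad_j=-2.0, hess_j=2.0, lam1=0.3, lam2=0.1)
```

For a single-coefficient group and a quadratic loss, one step from zero lands on the soft-thresholded value c - (λ1 + λ2)/h, here approximately 0.8, which gives a simple sanity check of the signs.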

DrCox regression via coordinate descent for overlap cases 

However, in reality, one gene can be involved in several pathways. To account for
overlapping, we modify the notation and the objective function (1). We denote the
$p$ variables by $X_1, \ldots, X_p$ with the corresponding regression
coefficients $\beta_1, \ldots, \beta_p$. Let $V_k \subseteq \{1, 2, \ldots, p\}$
be the set of indices of variables in the $k$th group. The objective function
designed for the overlap case can be written as

$$g(\beta) = -\ell_n(\beta) + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{k=1}^{K} \sqrt{\sum_{j' \in V_k} \beta_{j'}^2}. \qquad (2)$$

Note that predictor $X_j$ can belong to several pathways but is associated with
only one coefficient $\beta_j$.
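In the overlap case a coefficient enters the lasso term once but enters the group term once for every pathway that contains it. The Python sketch below shows this bookkeeping with index sets. The function names and the toy index sets are assumptions made for illustration.

```python
import math


def overlap_penalty(beta, groups, lam1, lam2):
    """lam1 * sum_j |beta_j| + lam2 * sum_k sqrt(sum_{j in V_k} beta_j^2).

    beta   : flat list of coefficients beta_1, ..., beta_p (0-based here)
    groups : list of index lists V_k, which may share indices
    """
    l1_part = sum(abs(b) for b in beta)
    l2_part = sum(math.sqrt(sum(beta[j] ** 2 for j in V)) for V in groups)
    return lam1 * l1_part + lam2 * l2_part


def groups_containing(j, groups):
    """Indices k of the groups that variable j belongs to (the set G_j)."""
    return [k for k, V in enumerate(groups) if j in V]


# Hypothetical example: variable 1 belongs to both groups.
toy_groups = [[0, 1], [1, 2]]
toy_penalty = overlap_penalty([3.0, 4.0, 0.0], toy_groups, lam1=0.3, lam2=0.1)
```

Here the shared coefficient 4.0 is counted once in the lasso part (3 + 4 + 0 = 7) and twice in the group part (5 from the first group, 4 from the second), so the penalty is approximately 0.3 × 7 + 0.1 × 9 = 3.0.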

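The parameter estimation needs to be modified accordingly. If we consider the
coordinate direction $e_j$ for $\beta_j$, the forward and backward directional
derivatives of $\beta_j$ for the objective function $g(\beta)$ in (2) are

$$d_{e_j} g(\beta) = \lim_{\tau \downarrow 0} \frac{g(\beta + \tau e_j) - g(\beta)}{\tau} = -\frac{\partial \ell_n(\beta)}{\partial \beta_j} + \lambda_1 (-1)^{I(\beta_j < 0)} + \lambda_2 \sum_{k \in G_j} \Big\{ (-1)^{I(\beta_j < 0)} I(\|\beta_{V_k}\|_2 = 0) + \frac{\beta_j}{\|\beta_{V_k}\|_2} I(\|\beta_{V_k}\|_2 > 0) \Big\}$$

and

$$d_{-e_j} g(\beta) = \lim_{\tau \downarrow 0} \frac{g(\beta - \tau e_j) - g(\beta)}{\tau} = \frac{\partial \ell_n(\beta)}{\partial \beta_j} + \lambda_1 (-1)^{I(\beta_j > 0)} + \lambda_2 \sum_{k \in G_j} \Big\{ (-1)^{I(\beta_j > 0)} I(\|\beta_{V_k}\|_2 = 0) - \frac{\beta_j}{\|\beta_{V_k}\|_2} I(\|\beta_{V_k}\|_2 > 0) \Big\},$$

where $\|\beta_{V_k}\|_2 = \sqrt{\sum_{j' \in V_k} \beta_{j'}^2}$ and
$G_j \subseteq \{1, 2, \ldots, K\}$ contains the indices of the groups that $X_j$
belongs to. After determining the direction for updating, the coefficient can be
updated by Newton's method, as in the non-overlap case.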



Results and discussion 

The DrCox model with the cyclic coordinate descent algorithm is applied to
analyze the PDAC data collected between 1999 and 2007. The aim of this work is to
identify core signaling pathway sets and genetic signatures within those pathways
related to pancreatic cancer survival. The microarray data of pancreatic cancer
include 102 samples [6], which are publicly available at the Gene Expression
Omnibus (accession number GSE21501). According to [6], among these 102 PDAC
patients, 66 had died by the end of the study (censoring rate 35%). The survival
time ranges from 1 month to 5 years. The Kaplan-Meier curve is plotted in
Figure 1 to show the 5-year survival probability for the 102 PDAC patients. Each
step corresponds to an actual event, i.e., the death of a pancreatic cancer
patient. A short vertical line without a drop means that a patient is censored,
e.g., the patient drops out of the study or the study ends. Additionally, two
stage variables, T stage and N stage, are given to describe the stages of
pancreatic cancer, where T stage describes the size of the primary tumor, ranging
from 1 to 4, and N stage describes the spread to nearby (regional) lymph nodes,
with values 0 or 1.
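For readers unfamiliar with the Kaplan-Meier curve, the product-limit estimate behind it can be sketched in a few lines of Python. The function name and the four-subject toy data are hypothetical and are not taken from the PDAC data set. The reported censoring rate follows from the counts above: (102 - 66)/102 ≈ 35%.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed death.

    times  : observed times (death or censoring)
    events : 1 for an observed death, 0 for a censored observation
    """
    curve = []
    survival = 1.0
    death_times = sorted(set(t for t, d in zip(times, events) if d))
    for t in death_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, d in zip(times, events) if u == t and d)
        survival *= 1.0 - deaths / at_risk  # the curve drops only at deaths
        curve.append((t, survival))
    return curve


# Hypothetical toy data (months): the subject at t = 2 is censored.
km_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
censoring_rate = (102 - 66) / 102
```

The censored subject at month 2 produces no step, but it leaves the risk set, so the next death at month 3 removes one of two remaining subjects rather than one of three.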

The whole dataset is randomly split into training, validation, and testing sets
of equal size. The training set is used for model fitting, and the validation set
is used for selecting the tuning constants. Using 3-fold cross-validation, we
obtained the optimal values $\lambda_1 = 0.3$ and $\lambda_2 = 0.1$, which
minimize the negative log-partial likelihood function. Figure 2 shows the 3-D
plots of the log-partial likelihood function and the number of selected genes vs.
$(\lambda_1, \lambda_2)$, respectively. Under the optimal tuning constants, 4
pathways and 15 genes are selected from the pool of 12660 probes of 6910 genes in
130 pathway sets organized in [1], which belong to 15 core groups in the
pancreatic cancer studies. The selected pathways are "regulation of DNA-dependent
transcription" (6 out of 2096 genes are selected), "ion transport" (7 out of 555
genes), "immune phagocytosis" (1 out of 215 genes), and "TGF-β (spermatogenesis)"
(1 out of 268 genes). These identified pathways and genes are biologically
meaningful and consistent with existing scientific findings. In particular, three
genes (ZNF233, SLC22A8, and PCYT1B) were also identified in the previous work
[16] using a lasso penalized Cox model that considered gene signatures only.

♦ The regulation of DNA-dependent transcription pathway is well known to be
related to the development of cancer. It regulates the frequency and rate of
cellular DNA-dependent transcription. This work identified six genes from three
families that are related to pancreatic cancer survival: DENND4A, KLF13, ZNF229,
ZNF233, ZNF395, and ZNF432.

- DENND4A is a c-myc promoter-binding protein [34], which mediates signal
transduction in the nucleus and regulates DNA replication and transcription.
DENND4A can also activate the RAB10 protein, a key regulator of polarized sorting
in epithelial cells, by promoting GDP → GTP exchange, which converts it from an
inactive GDP-bound form to an active GTP-bound form.

- KLF13 belongs to the KLF family of transcription factors for several oncogenes
and tumor suppressor genes [35,36], and it plays an important role in tumor
progression [36]. A recent study shows that KLF13 is overexpressed in oral cancer
cells. Inhibiting KLF13 expression can decrease the proliferation of cancer cells
and increase their sensitivity to ionizing radiation [36]. In pancreatic cancer,
KLF13 can suppress the cell growth and neoplastic transformation mediated by
K-RAS, which is mutated in more than 90% of pancreatic tumors [35]. Our work
suggests that KLF13 may be a useful biomarker for early detection and a possible
target for pancreatic cancer therapy.

- Zinc finger protein family members: ZNF229, ZNF233, ZNF395, and ZNF432 are
DNA-binding proteins containing zinc finger domains. Many of these zinc finger
proteins, including ZNF233 (also identified in the previous work [16]), have been
found to be associated with abnormalities of chromosome 19 in studies of kidney
[23] and pancreatic cancers [1]. Our analysis reveals that zinc finger proteins
and the corresponding pathway might be associated with the survival of pancreatic
cancer.

Figure 1 Kaplan-Meier curve of survival probability and 95% confidence interval for 102 PDAC patients. The Kaplan-Meier survival curve
(solid line) describes the probability of survival for the 102 PDAC patients. The dashed lines represent the 95% confidence interval. The
horizontal axis represents the survival time (in months). Each step corresponds to an actual event, i.e., the death of a pancreatic cancer patient. A short
vertical line without a drop means that a patient is censored, e.g., the patient drops out of the study or the study ends.

♦ The ion transport pathway plays integral roles in the development of cancer.
Since plasma membrane ion channels contribute to all basic cellular processes
[37,38], many ion channels are implicated in uncontrolled proliferation,
decreased apoptosis, and unorganized angiogenesis. According to [37], ion
channels also contribute to the six hallmarks of cancer [39]: "1)
self-sufficiency in growth signals, 2) insensitivity to antigrowth signals, 3)
evasion of programmed cell death (apoptosis), 4) limitless replicative potential,
5) sustained angiogenesis and 6) tissue invasion and metastasis."

We identified seven genes from three different channels or families: the TRP
channel (TRPV5 and TRPM6), regulating transcellular Ca2+ transport; the KCNK
channel (KCNK3 and KCNK18), regulating K+ transport; and the solute carrier (SLC)
family (SLC22A8, SLC8A3, and SLC24A6). Recent experimental studies have indicated
that these three families play important roles in cancer development.

- TRP (Ca2+) channel and TRPV5, TRPM6 genes regulate the calcium-mediated signal
transduction that is frequently altered in cancer [40]. Several genes in the TRPV
channel have been found to be up-regulated in prostate, colon, and breast cancer
cells [40-42]. In particular, the TRPV5 and TRPV6 genes exhibit unusually high
levels of single nucleotide polymorphisms (SNPs) in African populations as
compared to other populations [41]. Moreover, the genes TRPM6 and TRPM7 in the
TRPM channel can enhance the secretion of angiogenic factors, for example VEGF
[40], resulting in a sustained, unorganized angiogenesis process. The TRP channel
and the TRPV5 and TRPM6 genes identified from the pancreatic cancer survival data
could be possible targets for future cancer diagnosis and treatment.

- KCNK (K+) channel and KCNK3, KCNK18 genes regulate potassium (K+) transport
and membrane potential (Vm) in response to different physical and chemical
factors [38,40]. Several KCNK channel genes, for example KCNK9 [43], are
overexpressed in breast and lung cancers, and the gene KCNK2 can promote the
growth of prostate cancer cells [40,44].

[Figure 2: ... values are $\lambda_1 = 0.3$ and $\lambda_2 = 0.1$; (B) shows the 3-D plot of the number of selected genes with nonzero regression coefficients vs. ...]



- SLC family: SLC22A8, SLC8A3, and SLC24A6 are membrane transport proteins that
are involved in the transport and excretion of many organic ions, drugs, and
toxicants. Some genes in the SLC family are cancer-related, for example,
SLC43A2, whose overexpression is associated with adenocarcinomas and squamous
cell carcinoma [45], which was identified in the previous work [16].

♦ Immune phagocytosis pathway and CYBA gene: One prominent hallmark of cancer is
the evasion of immune destruction [39]. The immune system is important in
preventing tumor initiation and controlling tumor growth by identifying and
eliminating cancer cells (i.e., tumor immune surveillance) [46]. Macrophages and
other phagocytic cells are important players in the innate immune system, whose
functions include phagocytosis (homeostatic cell clearance), antigen presentation
(pathogen defense), and cytokine production (inflammatory responses). Recent
evidence [46-48] revealed that the active immune phagocytosis pathway could
inhibit tumor growth through phagocytic clearance, i.e., programmed cell removal
that clears damaged and foreign cells. The CYBA gene is a tumor suppressor [49]
that regulates phagocytes, the immune system cells involved in autophagy.
Phagocytosis and superoxide production are primarily regulated by the cytochrome
b-245 alpha (light) subunit (also known as p22phox), which is encoded by the gene
CYBA. Mutations in CYBA cause the failure of phagocytosis and immune defects
[50]. This observation supports our prediction that the immune phagocytosis
pathway and the tumor suppressor gene CYBA might be associated with pancreatic
cancer survival and tumor immune evasion. Targeting this pathway might lead to
effective cancer immunotherapies.

♦ TGF-β core pathway (spermatogenesis signaling set) and PCYT1B gene: The
transforming growth factor beta (TGF-β) signaling pathway is critical in
regulating many cellular processes, including cell growth, differentiation, and
apoptosis. It has genetic alterations in 100% of pancreatic cancers [1]. The gene
PCYT1B (phosphate cytidylyltransferase 1, choline, beta) was identified to be
associated with pancreatic cancer survival, which is consistent with the previous
work [16]. The expression of PCYT1B is frequently deregulated in epithelial
ovarian cancer [21], high-grade gliomas [51], and pancreatic ductal
adenocarcinoma [22]. Moreover, PCYT1B is a key regulator of choline phospholipid
metabolism, which is altered in breast [19], colon [20], and ovarian [21] cancers
and in gliomas [51]. These observations support our prediction that PCYT1B and
the TGF-β pathway are correlated with pancreatic cancer survival and might help
to grade the stage of pancreatic cancer patients.

Compared with the previous work [16], which selected 12 survival-relevant genes
using a lasso penalized Cox model, the DrCox model identified 4 pathways and 15
genes related to pancreatic cancer survival. We divide the patients into long-
and short-survival groups based on the selected pathways and genes and conduct
the logrank test to compare the two groups. The survival probabilities of these
two groups are plotted in Figure 3. The logrank test gives a p-value of 0.0179,
which indicates that the survival curves of the two groups differ significantly
at the 0.05 level and that the selected 4 pathways and 15 genes are significantly
associated with survival.
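The two-group logrank comparison can be sketched as follows. This Python sketch uses the standard observed-minus-expected statistic with the hypergeometric variance and a chi-square reference distribution with one degree of freedom. The function name and the toy data are hypothetical, and the sketch is not the code used to obtain the p-value reported above.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group logrank test; returns (chi-square statistic, p-value).

    Assumes at least one event time at which both groups are still at risk,
    so that the variance is positive.
    """
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    in_a = [True] * len(times_a) + [False] * len(times_b)
    observed_a = expected_a = variance = 0.0
    for t in sorted(set(u for u, e in zip(times, events) if e)):
        n = sum(1 for u in times if u >= t)                        # at risk, both groups
        n_a = sum(1 for u, a in zip(times, in_a) if u >= t and a)  # at risk, group A
        d = sum(1 for u, e in zip(times, events) if u == t and e)  # deaths at t
        d_a = sum(1 for u, e, a in zip(times, events, in_a) if u == t and e and a)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1.0 - n_a / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical toy groups: every death in group A precedes every death in group B.
chi2_toy, p_toy = logrank_test([1, 2], [1, 1], [3, 4], [1, 1])
```

With only four subjects the toy statistic is 49/17 (about 2.88) and the p-value is close to 0.09, so even perfectly ordered groups are not significant at this sample size.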

Conclusions 

In this work, we employed the doubly regularized Cox (DrCox) regression coupled
with the coordinate descent algorithm to analyze the high-dimensional gene
expression data of patients with localized and resected PDAC. Unlike the previous
work [16], the DrCox model can incorporate both gene and pathway information and
simultaneously infer genetic signatures and important signaling pathways that are
related to pancreatic cancer survival. The proposed cyclic coordinate descent
algorithm can quickly remove irrelevant genes and signaling pathways, so the
prediction of survival time is more accurate and robust than that of other
methods. Other group selection models select variables in an "all-in-or-all-out"
fashion with no within-group selection; that is, if one variable in a group
(pathway) is selected, all the other variables in the same group will be
selected. For example, if the gene PCYT1B in the TGF-β pathway is selected, all
the remaining genes in the TGF-β pathway will be selected as well. However, not
all the genes in the TGF-β pathway are involved in the development of pancreatic
cancer. The advantage of our DrCox method is that it can conduct group selection
and within-group selection simultaneously and eliminate the irrelevant genes.

This work identified four signaling pathways (ion transport, immune phagocytosis,
TGF-β (spermatogenesis), and regulation of DNA-dependent transcription) and 15
genes within these four pathways, which are directly correlated to pancreatic
cancer survival. Pancreatic cancer patients with these deregulated signaling
pathways and mutated genes might have a shorter survival time. Several inferred
signaling components have been confirmed to be frequently altered in pancreatic,
oral, prostate, colon, breast, and lung cancers in in vivo or in vitro
experiments. Our findings predict that the TRP (Ca2+) channel-related genes
(TRPV5 and TRPM6) and the KCNK (K+) channel-related genes in the ion transport
pathway are possible biomarkers of pancreatic cancer survival. The immune
phagocytosis pathway with the tumor suppressor gene CYBA, which regulates the
immune system cells and autophagy through phagocytic clearance, has not received
enough attention in the existing pancreatic cancer research literature. The gene
PCYT1B in the TGF-β pathway is frequently deregulated in cancer cells compared
with normal cells, which might help to grade the stage of pancreatic cancer
patients. KLF13 in the regulation of DNA-dependent transcription pathway could
regulate cell growth by regulating the KRAS pathway. These findings demonstrate
that these survival-associated genetic signatures and pathways could be useful
biomarkers for early cancer detection and diagnosis and could help pancreatic
cancer researchers grade the cancer stage and select appropriate therapies to
prolong patients' survival time at different stages.

Figure 3 Logrank test of the long- and short-survival groups based on the 4 pathways and 15 genes (p-value = 0.0179). The 102 PDAC
patients are divided into long- and short-survival groups based on the 4 pathways and 15 genes. The survival probabilities of these two groups
are compared using the logrank test. The p-value of 0.0179 means the two groups are well separated and our finding of 4 pathways and
15 genes is significant.

This work is the first attempt to infer pancreatic cancer survival-associated
signaling pathway sets and genetic signatures within those pathways using
statistical techniques. However, any statistical findings need to be tested by
further clinical and wet-lab experiments on pancreatic cancer. We were unable to
test our results on independent datasets in this paper because of limited data
sources. We expect our results to be verified or falsified by further
investigation. We hope the genetic signatures and pathways found in this paper
can help cancer researchers design new strategies for early detection and
diagnosis and lead to effective treatments and immunotherapies for pancreatic
cancer.